A History-based Estimation for LHCb job requirements
Author
Abstract
The main goal of a Workload Management System (WMS) is to find and allocate resources for the given tasks. The more and better job information the WMS receives, the easier it is to accomplish this task, which translates directly into higher resource utilization. Traditionally, the information associated with each job, such as expected runtime, is defined beforehand by the Production Manager in the best case and set to fixed arbitrary values by default. LHCb’s Workload Management System provides no mechanisms that automate the estimation of job requirements. As a result, much more CPU time is normally requested than is actually needed. This is a major problem in the context of multicore jobs in particular, since single- and multicore jobs must share the same resources. Consequently, grid sites need to rely on estimates given by the virtual organizations (VOs) so that the utilization of their worker nodes does not decrease when multicore job slots are made available. The main reason for moving to multicore jobs is the reduction of the overall memory footprint. Therefore, how the memory consumption of jobs can be estimated also needs to be studied. A detailed workload analysis of past LHCb jobs is presented. It includes a study of job features and their correlation with runtime and memory consumption. Based on these features, a supervised learning algorithm is developed that uses history-based prediction. The aim is to learn over time how jobs’ runtime and memory consumption evolve under changes in experiment conditions and software versions. It is shown that estimation can be notably improved if experiment conditions are taken into account.
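To make the idea of history-based prediction concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes that each finished job is described by a tuple of features and that the estimate for a new job is the median runtime of recent jobs with the same features. The class name `HistoryEstimator`, the feature values, the window size, and the choice of the median are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import median


class HistoryEstimator:
    """Illustrative estimator: predict runtime from past jobs with equal features."""

    def __init__(self, window=50):
        # Assumption: only the most recent `window` observations per key are kept,
        # so the estimate can follow changes over time.
        self.window = window
        self.history = defaultdict(list)

    def observe(self, features, runtime):
        """Record the measured runtime of a finished job."""
        runs = self.history[features]
        runs.append(runtime)
        del runs[:-self.window]  # drop observations older than the window

    def predict(self, features, default):
        """Return the median runtime of matching past jobs, or `default` if none."""
        runs = self.history.get(features)
        return median(runs) if runs else default


# Hypothetical feature tuples: (job type, software version, experiment conditions).
with_conditions = HistoryEstimator()
without_conditions = HistoryEstimator()
past_jobs = [
    (("simulation", "v1", "cond-A"), 100.0),
    (("simulation", "v1", "cond-A"), 120.0),
    (("simulation", "v1", "cond-B"), 300.0),
]
for features, runtime in past_jobs:
    with_conditions.observe(features, runtime)
    without_conditions.observe(features[:2], runtime)  # ignore the conditions

estimate_a = with_conditions.predict(("simulation", "v1", "cond-A"), default=1000.0)
estimate_any = without_conditions.predict(("simulation", "v1"), default=1000.0)
print(estimate_a, estimate_any)
```

In this toy data set, including the conditions in the key separates the short jobs from the long one, whereas the key without conditions mixes them into a single estimate. A job with no matching history falls back to the fixed default, which mirrors the situation the abstract describes as the status quo.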
Similar resources
Optimisation of LHCb Applications for Multi- and Manycore Job Submission
The Worldwide LHC Computing Grid (WLCG) is the largest computing grid and is used by all Large Hadron Collider experiments to process their recorded data. It provides approximately 400k cores as well as storage. Nowadays, most of the resources consist of multi- and manycore processors. Conditions at the Large Hadron Collider experiments will change and much larger workloads and jobs consuming ...
Disk storage management for LHCb based on Data Popularity estimator
This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system. All datasets are kept as archives on magnetic tapes. The most popular datasets are kept on disks. The algorithm takes the dataset usage history and metadata (size, type, configuration etc.) to generate a recommendation report. This article presents ho...
Relevance of Public Health BSc Curriculum to Job Requirements and Health System Expectations: Views of Graduates on Courses Syllabi and Content
Introduction: Public health experts play an important role in implementation of health centers’ programs. The comments and suggestions of graduate employees of this field about the level of functionality of the curriculum to professional requirements needed by graduates could help determine the shortcomings. This study aimed to investigate the level of coordination of curriculum and syllabus of...
Study of a solution with COTS for the LHCb calorimeter upgrade
We present a solution made out of Components Out of Shelf (COTS) for the analog processing of the signal of the LHCb calorimeters in the framework of the foreseen upgrade of the detector. The present proposal is based on the current functional solution, yet, to meet the stringent noise requirements, a number of modifications are proposed. Preliminary results on the prototype boards show promisi...
DIRAC Lightweight Information and Monitoring Services using XML-RPC and Instant Messaging
This paper presents recent work on a scalable, lightweight approach for distributed information and monitoring systems done for the LHCb experiment’s DIRAC grid software package. Two complementary systems are presented, one based on a layered, DNS-like information service, and the other a monitoring mechanism using instant messaging for ad hoc networks which are formed in a grid environment. Th...